Downtime Prevention Technique: Continuous Deployments
This isn't surprising to anyone who's done CI/CD well or has read Accelerate: small, easy-to-reason-about changes, merged to a single trunk and released to prod immediately on commit === fewer incidents.
What wasn't so obvious to me is one of the reasons why: the more you operate the machinery of releasing, observing, and rolling back, the better you get at those things.
Probably intuitive once you've read it. However, I had to go back through several years of post-incident responses and 1:1 notes to really zero in on it. And then once I saw it in data, at least in my own little shop, it was "oh, of course".
How it works is this: that incident you had that could have been over in 10 minutes but dragged on to 30 because nobody on the Zoom had the right permissions? That incident maybe still happens - there's nothing magic about CI/CD, mistakes will still be made - but now it only lasts 10 minutes, because everybody has the right permissions and knows what to do. Why? Because we do that thing every day, many times a day.
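To make the arithmetic concrete, here is a minimal sketch of how one might split an incident's duration into time spent fixing and time lost to friction (missing permissions, unfamiliar steps). The `Incident` record, its field names, and the `friction_share` helper are all hypothetical, invented for illustration; the only numbers used are the 10 and 30 minutes from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical post-incident record; the field names are illustrative."""
    name: str
    fix_minutes: int       # time the actual remediation (e.g. a rollback) took
    friction_minutes: int  # time lost to missing permissions or unfamiliar steps

    @property
    def total_minutes(self) -> int:
        return self.fix_minutes + self.friction_minutes


def friction_share(incidents: list[Incident]) -> float:
    """Fraction of total incident time that was friction rather than fixing."""
    total = sum(i.total_minutes for i in incidents)
    if total == 0:
        return 0.0
    return sum(i.friction_minutes for i in incidents) / total


# The example from the text: a 10-minute fix that lingered on to 30 minutes.
rarely_practiced = Incident("rollback, rarely practiced", fix_minutes=10, friction_minutes=20)

# The same incident when releasing and rolling back happen many times a day.
practiced_daily = Incident("rollback, practiced daily", fix_minutes=10, friction_minutes=0)
```

In this framing the fix itself is no faster; what daily practice removes is the friction term, which in the example accounts for two thirds of the incident.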